对于医疗保健提供者提供适当的患者护理的准确和详细说明,包括患者时​​间表中的药物变化,至关重要。医疗保健提供者或患者本身可能会引发患者药物的改变。用药更改采用多种形式,包括处方药和相关剂量修饰。这些更改提供了有关患者整体健康以及导致当前护理的理由的信息。然后,未来的护理可以基于患者的最终状态。这项工作探讨了从自由文本临床注释中自动提取药物变化信息。上下文药物事件数据集(CMED)是临床注释的语料库,其注释可以通过多种变化相关的属性来表征药物变化,包括更改的类型(启动,停止,增加等),更改,时间性,时间性,时间性,时间性,时间性,时间。改变可能性和否定。使用CMED,我们确定了临床文本中的药物提及,并提出了三个新型的基于BERT的新型基于BERT的系统,以解决注释的药物变化特征。我们证明,我们建议的体系结构改善了对CMED的初始工作改善药物变更分类的性能。我们确定了0.959 F1的高性能的药物提及,我们提出的系统将药物变化及其属性分类为0.827 F1。
translated by 谷歌翻译
已知尖峰神经网络(SNN)对于神经形态处理器实施非常有效,可以在传统深度学习方法上提高能效和计算潜伏期的数量级。最近,随着监督培训算法对SNN的背景,最近也使可比的算法性能成为可能。但是,包括音频,视频和其他传感器衍生数据在内的信息通常被编码为不适合SNN的实用值信号,从而阻止网络利用SPIKE定时信息。因此,从实价信号到尖峰的有效编码是至关重要的,并且会显着影响整个系统的性能。为了有效地将信号编码为尖峰,必须考虑与手头任务相关的信息以及编码尖峰的密度。在本文中,我们在扬声器独立数字分类系统的背景下研究了四种尖峰编码方法:发送三角洲,第一次尖峰的时间,漏水的集成和火神经元和弯曲尖刺算法。我们首先表明,与传统的短期傅立叶变换相比,在编码生物启发的耳蜗时,使用较少的尖峰会产生更高的分类精度。然后,我们证明了两种对三角洲变体的发送导致分类结果可与最先进的深卷积神经网络基线相媲美,同时降低了编码的比特率。最后,我们表明,几种编码方法在某些情况下导致比传统深度学习基线的性能提高,进一步证明了编码实用值信号中编码算法的尖峰力量艺术技术。
translated by 谷歌翻译
视频标题的当前度量主要基于参考和候选字幕之间的文本级别比较。然而,它们具有一些不可能的缺点,例如,它们不能在没有参考的情况下处理视频,并且由于视频到文本的一对多性质和忽视视觉相关性的一对多性质,它们可能导致偏见的评估。从人类评估者的观点来看,高质量的标题应与提供的视频一致,但不一定类似于文字或语义中的参考。灵感来自人类评估,我们提出了Emscore(基于匹配的分数),是视频字幕的一种新颖的无参考度量,其直接测量视频和候选字幕之间的相似性。受益于最近的大规模预训练模型的发展,我们利用了一个良好的预先训练的视觉语言模型来提取用于计算Emscore的视觉和语言嵌入。具体地,Emscore将粗粒(视频和标题)和细粒度(帧和单词)水平的匹配分数组合,这将考虑到视频的整体理解和详细特征。此外,考虑到潜在的信息增益,Emscore可以灵活地扩展到人类标记的参考可用的条件。最后但并非最不重要的是,我们收集Vatex-eval和ActivityNet-Foil数据集以系统地评估现有的度量标准。 Vatex-emp实验表明,Emscore具有更高的人类相关性和较低的参考依赖性。 ActivityNet-Foil实验验证Emscore可以有效地识别“幻觉”标题。将释放数据集以促进视频标题度量的开发。代码可在:https://github.com/shiyaya/emcore。
translated by 谷歌翻译
对抗攻击使他们的成功取得了“愚弄”DNN等,基于梯度的算法成为一个主流。基于线性假设[12],在$ \ ell_ \ infty $约束下,在梯度上应用于渐变的$符号$操作是生成扰动的良好选择。然而,存在来自这种操作的副作用,因为它导致真实梯度与扰动之间的方向偏差。换句话说,当前方法包含真实梯度和实际噪声之间的间隙,这导致偏置和低效的攻击。因此,在理论上,基于泰勒膨胀,偏差地分析了$ \符号$,即快速梯度非符号法(FGNM)的校正。值得注意的是,FGNM是一般例程,它可以在基于梯度的攻击中无缝地更换传统的$符号$操作,以可忽略的额外计算成本。广泛的实验证明了我们方法的有效性。具体来说,我们的大多数和\ textBF {27.5 \%}平均突出了它们,平均而言。我们的匿名代码是公开可用的:\ url {https://git.io/mm -fgnm}。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to the semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable for ignoring important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantic meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
translated by 谷歌翻译
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
translated by 谷歌翻译